Rework Linker dispatching for cross-major nvJitLink/driver skew by cpcloud · Pull Request #1911 · NVIDIA/cuda-python

cpcloud · 2026-04-14T21:05:10Z

Summary

- Replaces module-level "decide once" backend selection with per-Linker-instance dispatch at __init__ time
- Factors the decision into a pure _choose_backend() helper for GPU-free unit testing
- Handles nvJitLink/driver major-version mismatches: falls back to the driver linker for non-LTO linking, and raises RuntimeError for LTO when the backends are incompatible
- Probes driver_version() lazily, so environments with nvJitLink but no driver (e.g. build containers) still work
- _probe_nvjitlink() is cached and warns at most once when nvJitLink is absent
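A cached probe that warns at most once can be sketched with `functools.lru_cache`. This is a minimal sketch, not the PR's `_probe_nvjitlink()`: the name `probe_nvjitlink` is hypothetical, and it assumes that a successful import of `cuda.bindings.nvjitlink` is a sufficient availability signal (the real helper may also need to load the shared library or query its version).

```python
import functools
import importlib
import warnings
from types import ModuleType
from typing import Optional


@functools.lru_cache(maxsize=None)
def probe_nvjitlink(module_name: str = "cuda.bindings.nvjitlink") -> Optional[ModuleType]:
    """Hypothetical probe: import the nvJitLink bindings once, or return None.

    The result is cached per module name, so the warning below is emitted at
    most once no matter how many linkers are constructed.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        warnings.warn(
            f"{module_name} is not importable; non-LTO linking will fall back "
            "to the CUDA driver linker",
            RuntimeWarning,
            stacklevel=2,
        )
        return None
```

Because the second call is served from the cache, it never reaches `warnings.warn()`, so even with `warnings.simplefilter("always")` active only one warning is recorded.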

Breaking change: options.link_time_optimization=True with nvJitLink absent now raises RuntimeError instead of silently passing CU_JIT_LTO to the driver (which was not real LTO linking).

Decision matrix

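| Driver major | nvJitLink version | LTO-IR input | LTO/PTX option | Result |
|---|---|---|---|---|
| any | None | no | no | driver |
| any | None | yes (or LTO requested) | any | raise RuntimeError |
| M | (M, *) | any | any | nvJitLink |
| D ≠ N | (N, *) | no | no | driver fallback |
| D ≠ N | (N, *) | yes (or LTO requested) | any | raise RuntimeError |
| None | available | any | any | nvJitLink |

M: same major version on both sides; D, N: differing driver and nvJitLink major versions; `*`: any minor version; None: nvJitLink absent, or driver version not queryable (e.g. build containers).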
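The matrix can be expressed as a small pure function. This is a sketch under stated assumptions, not the PR's `_choose_backend()`: the name `choose_backend`, its parameters, and the string return values are hypothetical, and it assumes that LTO-IR input, an LTO request, and a PTX-output request all require nvJitLink (the matrix does not spell out the cross-major case with PTX output and no LTO).

```python
from typing import Optional


def choose_backend(
    driver_major: Optional[int],
    nvjitlink_major: Optional[int],
    *,
    has_ltoir_input: bool = False,
    lto: bool = False,
    ptx: bool = False,
) -> str:
    """Hypothetical dispatcher: return "nvjitlink" or "driver", or raise.

    driver_major is None when the driver version cannot be queried;
    nvjitlink_major is None when nvJitLink is not available.
    """
    # Assumption: any of these can only be satisfied by nvJitLink.
    needs_nvjitlink = has_ltoir_input or lto or ptx

    if nvjitlink_major is None:
        if needs_nvjitlink:
            raise RuntimeError("this link requires nvJitLink, which is not available")
        return "driver"

    # Matching majors, or an unknown driver (optimistic choice): use nvJitLink.
    if driver_major is None or driver_major == nvjitlink_major:
        return "nvjitlink"

    # Cross-major skew: nvJitLink output may not be loadable by this driver.
    if needs_nvjitlink:
        raise RuntimeError(
            f"nvJitLink {nvjitlink_major}.x cannot be used with driver "
            f"{driver_major}.x for LTO-IR/LTO/PTX linking"
        )
    return "driver"
```

For example, a 13.x driver paired with nvJitLink 12.x yields `"driver"` for plain linking and raises `RuntimeError` when LTO is requested. Keeping the function free of probes is what makes the whole matrix testable without a GPU.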

Test plan

- GPU-free parametrized tests for the full decision matrix (test_linker_dispatch.py)
- Test helpers handle driver-version failure gracefully
- CI: existing GPU tests pass with per-instance dispatch
- CI: cross-major behavior verified (requires multiple CTK versions)

Closes #712

🤖 Generated with Claude Code

github-actions · 2026-04-14T21:19:30Z

Doc Preview CI
🚀 View preview at https://nvidia.github.io/cuda-python/pr-preview/pr-1911/
https://nvidia.github.io/cuda-python/pr-preview/pr-1911/cuda-core/
https://nvidia.github.io/cuda-python/pr-preview/pr-1911/cuda-bindings/
https://nvidia.github.io/cuda-python/pr-preview/pr-1911/cuda-pathfinder/
Preview will be ready when the GitHub Pages deployment is complete.

Replace the module-level "decide once, use everywhere" nvJitLink-vs-driver choice with a per-Linker-instance decision that considers the CUDA driver major version, nvJitLink's availability and major version, the input code types, and whether link-time optimization is requested.

The dispatch is factored into a pure helper `_choose_backend()` that is fully unit-testable without a GPU. Its decision matrix:

- no nvJitLink, no LTO -> driver
- matching majors -> nvJitLink
- cross-major, no LTO -> driver (nvJitLink output may not be loadable)
- LTO + no nvJitLink -> RuntimeError
- LTO + cross-major -> RuntimeError

This resolves the cross-major driver scenario described in NVIDIA#712, where nvJitLink 12.x may produce a CUBIN that driver 13.x cannot load (or vice versa). The previous code committed to nvJitLink unconditionally whenever it was importable.

Tests:

- `tests/test_linker_dispatch.py` parametrizes the entire matrix against `_choose_backend()` with mocked versions (no GPU, no driver required).
- `tests/test_linker.py::TestLinkerDispatch` drives the same decision through the real `Linker` constructor via monkeypatched version probes.
- `tests/test_optional_dependency_imports.py` is updated to exercise the new `_probe_nvjitlink()` helper in place of the removed `_decide_nvjitlink_or_driver()`.
- `tests/test_program.py` and `tests/test_linker.py` use a small local helper to compute the effective backend for the current environment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
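Constructor-time dispatch combined with monkeypatched probes can be illustrated with a toy class. `Probes`, `LinkerSketch`, and the single `lto` flag (standing in for LTO-IR inputs and LTO/PTX options together) are hypothetical; the point is that the probes are read inside `__init__`, so patching them changes the next instance's backend instead of a value frozen at import time.

```python
from typing import Optional, Tuple
from unittest import mock


class Probes:
    """Hypothetical environment probes; tests patch these attributes."""

    @staticmethod
    def driver_major() -> Optional[int]:
        return None  # e.g. no driver in a build container

    @staticmethod
    def nvjitlink_version() -> Optional[Tuple[int, int]]:
        return None  # e.g. nvJitLink not installed


class LinkerSketch:
    def __init__(self, *, lto: bool = False) -> None:
        # Probe per instance, at construction time, not once at import.
        driver = Probes.driver_major()
        njl = Probes.nvjitlink_version()
        if njl is not None and (driver is None or driver == njl[0]):
            self.backend = "nvjitlink"  # matching majors, or unknown driver
        elif lto:
            raise RuntimeError("LTO requires a compatible nvJitLink")
        else:
            self.backend = "driver"  # no nvJitLink, or cross-major fallback


# Example: a 13.x driver with nvJitLink 12.9 falls back to the driver linker.
with mock.patch.object(Probes, "driver_major", return_value=13), \
        mock.patch.object(Probes, "nvjitlink_version", return_value=(12, 9)):
    print(LinkerSketch().backend)  # prints "driver"
```

Reading the probes through a class attribute keeps `mock.patch.object` simple; `mock` restores the original static methods when each `with` block exits.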

driver_version() was called unconditionally during Linker.__init__, which fails in environments where nvJitLink is installed but the CUDA driver is absent (e.g., build containers). It now catches the exception and sets driver_major=None. When driver_major is unknown and nvJitLink is available, the nvJitLink backend is selected optimistically.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Test helpers that called driver_version() at module scope crashed in no-driver environments during test collection. Mirror the production lazy-probe pattern: catch exceptions and pass None.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
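The failure-tolerant driver probe used by both the constructor and the test helpers can be sketched as a function that maps any error to `None`. The name `probe_driver_major` and its `get_driver_version` argument are hypothetical; the sketch assumes the callable returns the integer encoding used by `cuDriverGetVersion` (1000 * major + 10 * minor), which may differ from what cuda.core's `driver_version()` returns.

```python
from typing import Callable, Optional


def probe_driver_major(get_driver_version: Callable[[], int]) -> Optional[int]:
    """Return the CUDA driver major version, or None if it cannot be queried.

    Meant to be called lazily (from a constructor or a test helper), never at
    import or module scope, so a missing driver cannot break import or test
    collection.
    """
    try:
        version = get_driver_version()
    except Exception:
        # No driver or no GPU (e.g. a build container): report "unknown"
        # and let the dispatcher decide.
        return None
    # cuDriverGetVersion encodes 12.9 as 12090 and 13.0 as 13000.
    return version // 1000
```

Returning `None` rather than raising pushes the policy decision (optimistic nvJitLink, driver fallback, or error) into the dispatcher, where the decision matrix already covers the unknown-driver case.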

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Linker_link was nulling self._drv_log_bufs right after cuLinkComplete, releasing the bytearrays whose addresses were handed to the driver via CU_JIT_INFO_LOG_BUFFER and CU_JIT_ERROR_LOG_BUFFER at cuLinkCreate time. The CUlinkState retains those pointers until cuLinkDestroy, which runs during Linker tp_dealloc. Freeing the bytearrays first left the driver with dangling pointers and corrupted the heap; subsequent CUDA calls (e.g. NVRTC compilation in the next test fixture) segfaulted.

This path became reachable in CI with the new per-instance backend dispatch: CTK 12.9.1 + driver 13.0 runners now hit the driver linker for cross-major linking, which was never exercised before.

Retain _drv_log_bufs until the cdef class is deallocated; pxd declaration order ensures _culink_handle is released (running cuLinkDestroy) before the bytearrays are cleared.

…etime

The CUDA driver docs state: "optionValues must remain valid for the life of the CUlinkState if output options are used." The driver writes the log fill sizes (outputs) back into the optionValues slots for CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES and CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES.

Linker_init previously declared c_jit_keys/c_jit_values as local cdef vector[...] variables, which were destroyed when Linker_init returned. The driver was then left writing through dangling pointers during subsequent cuLinkAddData/cuLinkComplete/cuLinkDestroy calls.

This bug was always latent. It became reachable with the per-instance backend dispatch (CTK 12.9.1 runners now select the driver linker when paired with a driver 13 install), and it manifested only on driver 13, as heap corruption that killed the next NVRTC or link call.

Promote the two arrays to cdef class fields declared after _culink_handle in the pxd. Cython's tp_dealloc destroys C++ fields in pxd declaration order, so the vectors are destroyed after the shared_ptr deleter runs cuLinkDestroy. The cuda.bindings high-level wrapper (driver.cuLinkCreate) already handles this by attaching a keepalive to CUlinkState; cuda.core's low-level cydriver.cuLinkCreate path did not.

Also drop the now-unused void_ptr ctypedef.
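The ownership rule behind both lifetime fixes (keep every buffer and option array whose address the driver holds alive until the link state is destroyed, and destroy the state first) can be sketched in pure Python with `ctypes`. `FakeLinkState` and `LinkerSketch` are hypothetical stand-ins that only simulate the driver writing an output size into `optionValues`; the PR itself relies on cdef class fields and pxd declaration order rather than an explicit `close()`.

```python
import ctypes
from typing import Optional


class FakeLinkState:
    """Hypothetical stand-in for a CUlinkState.

    Like the driver with output options, it remembers the raw address of the
    caller's optionValues array and writes into it on later calls.
    """

    def __init__(self, values_addr: int) -> None:
        self._values_addr = values_addr
        self.destroyed = False

    def complete(self, info_log_bytes_used: int) -> None:
        if self.destroyed:
            raise RuntimeError("link state already destroyed")
        # Write an output value back into slot 0 (the log-size slot).
        ctypes.c_void_p.from_address(self._values_addr).value = info_log_bytes_used

    def destroy(self) -> None:
        self.destroyed = True


class LinkerSketch:
    def __init__(self, info_log_size: int = 1024) -> None:
        # Instance attributes, not locals of __init__: the link state keeps
        # their addresses after __init__ returns.
        self._info_log: Optional[ctypes.Array] = ctypes.create_string_buffer(info_log_size)
        self._jit_values: Optional[ctypes.Array] = (ctypes.c_void_p * 2)(
            info_log_size,                     # in: buffer size; out: bytes used
            ctypes.addressof(self._info_log),  # in: log buffer pointer
        )
        self._state: Optional[FakeLinkState] = FakeLinkState(
            ctypes.addressof(self._jit_values)
        )

    def link(self) -> int:
        assert self._state is not None and self._jit_values is not None
        self._state.complete(info_log_bytes_used=17)
        return self._jit_values[0]

    def close(self) -> None:
        # Destroy the link state first; only then drop the arrays it points into.
        if self._state is not None:
            self._state.destroy()
            self._state = None
        self._jit_values = None
        self._info_log = None
```

An explicit `close()` is used here because Python makes no language-level promise about the order in which instance attributes are torn down during garbage collection; in the Cython version, pxd declaration order provides that ordering.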

The as_bytes() method raises ValueError for unsupported backends (per its docstring, and matching the test directly above this one). The skip-guarded driver-backend test was asserting RuntimeError, so it always failed on CTK 12.9.1 runners, where the skip condition does not apply.

Adds parametrized cases for the build-container path where cuDriverGetVersion is unqueryable: with nvJitLink present the dispatcher picks nvjitlink optimistically; with nvJitLink absent it falls back to driver for non-LTO and raises for LTO. These paths are documented in _choose_backend's contract but were previously uncovered.

cpcloud added this to the cuda.core v1.0.0 milestone Apr 14, 2026

cpcloud added the enhancement, P0, cuda.core, and breaking labels Apr 14, 2026

cpcloud self-assigned this Apr 14, 2026

cpcloud force-pushed the linker-dispatch-rework-712 branch 2 times, most recently from 0c94703 to 5d8fa24, April 16, 2026 21:58

cpcloud requested review from leofang and mdboom April 17, 2026 13:07

cpcloud force-pushed the linker-dispatch-rework-712 branch from 61ea4ff to a259b8d, April 17, 2026 15:54

cpcloud and others added 8 commits April 18, 2026 05:44

Fix SPDX header and ruff format in test_linker_dispatch.py

24dfea0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cpcloud force-pushed the linker-dispatch-rework-712 branch from a259b8d to 53541a6, April 18, 2026 11:59

cpcloud wants to merge 8 commits into NVIDIA:main from cpcloud:linker-dispatch-rework-712
